Context Sensitive Word Deletion Model for Statistical Machine Translation

Authors

  • Qiang Li
  • Yaqian Han
  • Tong Xiao
  • Jingbo Zhu
Abstract

Word deletion (WD) errors can obscure the meaning of the source sentence in translations produced by phrase-based statistical machine translation (SMT), and they have a critical impact on the adequacy of the output of SMT systems. In this paper, we first classify word deletions into two categories: wanted and unwanted. To handle both kinds, we propose a maximum entropy based word deletion model that improves translation quality in phrase-based SMT. The proposed model is based on features learned automatically from a real-world bitext. In experiments on Chinese-to-English news and web translation tasks, our approach generates more adequate translations than the baseline system, and the word deletion model yields a +0.99 BLEU improvement and a 2.20-point TER reduction on the NIST machine translation evaluation corpora.
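To make the idea of a maximum entropy word deletion model concrete, the following is a minimal sketch of a binary maximum entropy (logistic) classifier that scores whether a source word may be dropped given its context. It is not the authors' implementation: the feature templates in `featurize`, the romanized toy tokens, the labels, and the training settings are all assumptions invented for illustration, since the abstract does not specify them.

```python
import math

def featurize(word, prev_word, next_word):
    """Hypothetical context features: the word itself plus its two neighbors."""
    return {f"w={word}", f"prev={prev_word}", f"next={next_word}", "bias"}

def prob_delete(weights, feats):
    """Binary maximum entropy (logistic) probability that the word is deleted."""
    score = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-score))

def train(examples, epochs=200, lr=0.5):
    """Fit feature weights by stochastic gradient ascent on the log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for feats, label in examples:
            error = label - prob_delete(weights, feats)
            for f in feats:
                weights[f] = weights.get(f, 0.0) + lr * error
    return weights

# Invented toy data: (word, previous word, next word) -> 1 if the source word
# is treated as deletable (left untranslated), 0 if it must be translated.
toy = [
    (("de", "wo", "shu"), 1),
    (("de", "ta", "che"), 1),
    (("shu", "de", "</s>"), 0),
    (("che", "de", "</s>"), 0),
    (("wo", "<s>", "de"), 0),
]
examples = [(featurize(*context), label) for context, label in toy]
weights = train(examples)

# Score a function word in an unseen context and a content word.
p_function_word = prob_delete(weights, featurize("de", "ni", "shu"))
p_content_word = prob_delete(weights, featurize("shu", "de", "</s>"))
print(p_function_word > 0.5, p_content_word < 0.5)
```

In a phrase-based decoder, a probability of this kind would typically enter the log-linear model as one more feature score; how the paper integrates its model is not stated in the abstract.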


Similar resources

Presenting a Ranker for a Semantic Error Checker Using Context-Sensitive Features

Nowadays, a large volume of documents is generated daily. Because these documents are written by different people, they contain spelling errors, and these errors lower the quality of the documents. Automatic writing assistance tools such as spell checkers and correctors can therefore help to improve their quality. Context-sensitive errors are misspelled words that have been...


Discourse-aware Statistical Machine Translation as a Context-sensitive Spell Checker

Real-word errors, or context-sensitive spelling errors, are misspelled words that have been wrongly converted into another word of the vocabulary. One way to detect and correct real-word errors is to use Statistical Machine Translation (SMT), which translates a text containing some real-word errors into a correct text in the same language. In this paper, we improve the results of the mentioned SMT system...


Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation

Most current statistical machine translation (SMT) systems make very little use of contextual information to select a translation candidate for a given input language phrase. However, despite evidence that rich context features are useful in stand-alone translation disambiguation tasks, recent studies reported that incorporating context-rich approaches from Word Sense Disambiguation (WSD) metho...


Statistical Machine Translation with Local Language Models

Part-of-speech language modeling is commonly used as a component in statistical machine translation systems, but there is mixed evidence that its usage leads to significant improvements. We argue that its limited effectiveness is due to the lack of lexicalization. We introduce a new approach that builds a separate local language model for each word and part-of-speech pair. The resulting models ...


Word Sense Disambiguation for Statistical Machine Translation

While much effort has been put into designing and evaluating Word Sense Disambiguation (WSD) models for translation in the WSD community, standard Statistical Machine Translation (SMT) systems have achieved remarkable improvements in translation quality without modeling WSD explicitly. However, inspecting SMT output suggests that SMT needs better semantic modeling to accurately translate meaning....




Publication date: 2017